Linear Regression and Uncertainty

Carolina Torreblanca

University of Pennsylvania

Global Development: Intermediate Topics in Politics, Policy, and Data

PSCI 3200 - Spring 2025

Agenda

  • Parameters vs. Estimates
  • Confidence Intervals
  • The Parameters in Linear Regression
  • Quantifying Uncertainty in Linear Regression

From sample to population

  • Often we want to know some characteristic about a population of interest
    • We call this characteristic a “parameter”
  • But we only have a sample of that population
    • We call this the “estimate”
  • How do we make inferences about the population parameter with what we learn from our estimate?

Empirical Example: Brexit

  • Data from the British Electoral Survey

  • We want to learn the probability that a British citizen supports Brexit.

  • Notice that this probability is, in expectation, the same as the proportion of pro-Brexit citizens in the country

  • We have a random sample of British citizens who were surveyed

Empirical Example: Brexit

brex <- read.csv(here::here("./slides/code/BES.csv"))
str(brex)
'data.frame':   30895 obs. of  4 variables:
 $ vote     : chr  "leave" "leave" "stay" "leave" ...
 $ leave    : int  1 1 0 1 NA 0 1 1 1 1 ...
 $ education: int  3 NA 5 4 2 4 3 2 3 4 ...
 $ age      : int  60 56 73 64 68 85 78 51 59 68 ...
table(brex$vote, useNA = "always")

   don't know         leave          stay wouldn't vote          <NA> 
         2314         13692         14352           537             0 
brex$exit <- ifelse(brex$vote=="leave", 1, 0)
prop.table(table(brex$exit))*100

       0        1 
55.68215 44.31785 

Empirical Example: Brexit

  • “Support of Brexit” is a Random Variable:

    • \(Support \sim \text{Bernoulli}(p)\)
  • With a probability mass function:

\[\begin{align*} f(k; p) = \begin{cases} p & \text{if } k=1, \\ q = 1 - p & \text{if } k=0. \end{cases} \end{align*}\]
  • Where \(E(Support) = p\) and \(Var(Support) = p(1-p)\)
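These moments can be checked with a quick simulation. This is a sketch with an illustrative value of p = 0.44, close to the sample proportion above:

```r
# Check the Bernoulli moments by simulation:
# E(Support) = p and Var(Support) = p * (1 - p)
set.seed(123)
p <- 0.44  # illustrative value, not the true population parameter
support <- rbinom(1e5, size = 1, prob = p)

mean(support)  # should be close to p
var(support)   # should be close to p * (1 - p) = 0.2464
```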

Empirical Example: Brexit

  • We want to know what \(p\) is!

  • But it is a population parameter and we only have a sample

  • We can estimate \(\hat{p}\) by doing

phat <- mean(brex$exit)
phat
[1] 0.4431785
  • But if our sample were slightly different, we would have gotten a different \(\hat{p}\)

Sampling Distribution of \(\hat{p}\)

require(tidyverse)
set.seed(7)

out.means <- c()

for (i in 1:1000) {
  temp_dat <- sample_n(brex, nrow(brex), replace = T)
  out.means[i] <- mean(temp_dat$exit)
  rm(temp_dat)
}

hist(out.means)
abline(v = mean(out.means), col = "red", lwd = 2)

Sampling Distribution of \(\hat{p}\)

  • That’s just a normal distribution!

  • All normal distributions can be described by their mean and their standard deviation

  • This one is called “sampling distribution of the sample mean”

    • Centered around our estimate
    • \(SE = \sqrt{\frac{Var (Support)}{n}}\)
  • Knowing it is a normal distribution helps us quantify the uncertainty in our estimates
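The analytic SE formula can be checked against the spread of a bootstrap sampling distribution like the one on the previous slide. This sketch uses simulated Bernoulli data in place of the BES sample (p = 0.44 is an illustrative value):

```r
# Analytic SE vs. the sd of a bootstrap sampling distribution,
# on simulated data standing in for the survey sample
set.seed(7)
exit <- rbinom(30000, size = 1, prob = 0.44)

se_formula <- sqrt(var(exit) / length(exit))  # SE = sqrt(Var / n)

boot_means <- replicate(1000, mean(sample(exit, replace = TRUE)))
se_boot <- sd(boot_means)                     # sd of bootstrap means

c(analytic = se_formula, bootstrap = se_boot)  # nearly identical
```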

Standard Errors

  • The standard deviation of the sampling distribution of an estimator is called “standard error”

  • One interpretation: “How off are our estimates, on average, from the true population parameter?”

  • By calculating the standard error we can know the shape of the sampling distribution. This helps us do two important things:

    1. Construct confidence intervals (what is the range within which the true value is likely to be?)
    2. Do hypothesis testing (p-values and statistical significance)

Confidence Intervals

  • Range of values that likely includes the true value of our parameter of interest

  • Specifically, the range that includes a pre-specified proportion of the density of the sampling distribution of our estimator

  • Interpretation: “With X% confidence, the true parameter is within the confidence interval”

    • Confidence is NOT probability! Across repeated samples, X% of intervals built this way would contain the true parameter

Confidence Intervals

  • E.g: Because of the properties of the normal distribution, we know that 95% of the density will be within the following range:

\[\begin{align*}\small CI_{95\%} = \left[\hat{p} - 1.96 \times \sqrt{\frac{Var (Support)}{n}},\ \hat{p} + 1.96 \times \sqrt{\frac{Var (Support)}{n}}\right] \end{align*}\]

  • Interpretation: With 95% confidence, the true value of \(p\) is within that interval
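The repeated-sampling meaning of "95% confidence" can be seen by simulation: draw many samples from a population with a known p (0.44 here, an illustrative value), build a 95% CI from each, and count how often the interval covers the truth:

```r
# Coverage simulation: about 95% of the intervals should contain p
set.seed(42)
p <- 0.44
n <- 1000
covers <- replicate(2000, {
  x <- rbinom(n, 1, p)
  phat <- mean(x)
  se <- sqrt(var(x) / n)
  (phat - 1.96 * se <= p) && (p <= phat + 1.96 * se)
})
mean(covers)  # close to 0.95
```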

Example

# standard deviation of the sampling distribution computed with the formula
se <- round(sqrt(var(brex$exit)/nrow(brex)),3)
# A data-driven confidence interval
quantile(out.means, c(.025, .975))
     2.5%     97.5% 
0.4376428 0.4487458 
# An analytic solution to the confidence interval
(ci_95 <- c(phat - (1.96*se), phat + (1.96*se)))
[1] 0.4372985 0.4490585

Linear Regression

  • We can think of the parameters of a linear regression in the same way. Imagine there exists a population-level linear relationship between X and Y:

\[\begin{equation*} Y_i = \alpha + \beta X_i + \varepsilon_i \end{equation*}\]

  • \(\alpha\) is the intercept, common to all units.

  • \(\beta\) is the slope, common to all units.

  • We need to estimate these parameters with our sample

  • We fit the line that minimizes the error in prediction
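Minimizing squared prediction error has a closed-form solution: \(\hat{\beta} = Cov(x, y) / Var(x)\) and \(\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}\). A sketch on simulated data, checked against `lm()`:

```r
# OLS estimates by hand vs. lm(), on simulated data
set.seed(8)
x <- rnorm(100, 4, .8)
y <- 5 - 0.03 * x + rnorm(100)

beta_hat  <- cov(x, y) / var(x)            # slope
alpha_hat <- mean(y) - beta_hat * mean(x)  # intercept

c(alpha_hat, beta_hat)
coef(lm(y ~ x))  # identical estimates
```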

Linear Regression

# Simulated data
set.seed(8)
alpha <- 5
beta <- -.03
x <- rnorm(1000, 4, .8) # predictor
error <- rnorm(1000, 0, 1)

# relationship is linear by construction!!
y <- alpha + (beta*x) + error

plot(x, y, main = "Scatterplot with Best Fit Line", 
     col = "gray80", pch = 16)
# Fit linear model
model <- lm(y ~ x)
# Add best fit line
abline(model, col = "red", lwd = 2)

Linear Regression

  • How do we interpret this?

  • Why are \(\hat{\beta}\) and \(\hat{\alpha}\) different from \(\alpha\) and \(\beta\)?


Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8226 -0.6939  0.0102  0.6896  3.3679 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.17773    0.15841  32.686   <2e-16 ***
x           -0.06877    0.03911  -1.758    0.079 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.011 on 998 degrees of freedom
Multiple R-squared:  0.003089,  Adjusted R-squared:  0.00209 
F-statistic: 3.092 on 1 and 998 DF,  p-value: 0.07899
The slope estimate and its standard error, pulled out of the fitted model:

          x 
-0.06877102 
[1] 0.03911026

Linear Regression

  • Estimates of \(\hat{\beta}\) and \(\hat{\alpha}\) are uncertain

  • They have their own sampling distributions!

    • CLT: They are also normal
  • We can use what we know about normal distributions to quantify their uncertainty

  • We can construct confidence intervals in the exact same way!

  • Or do hypothesis tests
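A 95% CI for the slope follows the same recipe, estimate ± 1.96 × SE. This sketch re-runs the earlier simulation; `confint()` uses the t distribution, so it differs only slightly at this sample size:

```r
# 95% CI for the slope: estimate +/- 1.96 * SE
set.seed(8)
x <- rnorm(1000, 4, .8)
y <- 5 + (-.03 * x) + rnorm(1000)
model <- lm(y ~ x)

b  <- coef(summary(model))["x", "Estimate"]
se <- coef(summary(model))["x", "Std. Error"]

c(b - 1.96 * se, b + 1.96 * se)  # normal approximation
confint(model)["x", ]            # t-based, nearly identical
```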

Hypothesis Testing and P-values

  • We are often interested in determining whether the true parameter is different from zero with a pre-specified level of confidence

\[\begin{align*} H_0: \beta = 0 \\ H_1: \beta \neq 0 \end{align*}\]

  • We are going to reject \(H_0\) in favor of \(H_1\) if we are sufficiently confident we aren’t making a mistake

  • That is, the estimate is far enough from zero, in either direction, that we have at least 95% confidence that it’s different from zero

P-value

  1. Assume the null is true, so the sampling distribution of the estimator is centered at 0

  2. “Draw” that sampling distribution

    • Remember its standard deviation equals the standard error
  3. Calculate the probability of observing an estimate at least as extreme as the one you observed if the true parameter is zero

  4. If you are doing a two-tailed test, use the absolute value of the estimate
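These steps can be computed by hand on the simulated data from the earlier slides, and compared to the p-value `lm()` reports:

```r
# Two-tailed p-value by hand: probability of a draw at least as
# extreme as |estimate / SE| from a normal centered at 0
set.seed(8)
x <- rnorm(1000, 4, .8)
y <- 5 + (-.03 * x) + rnorm(1000)
model <- lm(y ~ x)

z <- coef(summary(model))["x", "t value"]
2 * pnorm(-abs(z))                     # normal approximation
coef(summary(model))["x", "Pr(>|t|)"]  # lm's t-based p-value
```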

P-value

  • Using the simulated data above, the two-tailed p-value for the slope:
        x 
0.0786815 

Statistical Significance

  • If you are using a 5% significance level (95% confidence), you reject \(H_0: \beta = 0\) if \(p\text{-value} \leq .05\)

  • If you are using a 1% significance level (99% confidence), you reject \(H_0: \beta = 0\) if \(p\text{-value} \leq .01\)

  • When an estimate’s p-value is at or below that threshold, we say the coefficient is “statistically significant”

  • It just means we are confident enough that the parameter is different from zero

  • In papers, this is flagged with different numbers of stars!

Summing Up: Statistical vs. Scientific Significance

  • Statistical significance is NOT a measure of importance

  • Statistical significance just means an effect or difference is likely NOT to be zero

  • But it might still be too small to matter substantively

  • The more observations we have, the better “powered” we are to detect small effects
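The power point can be seen by simulation: the same small slope used in the earlier example (-0.03) is undetectable in a small sample but clearly significant in a very large one:

```r
# Same small effect, two sample sizes: the p-value shrinks as n grows
set.seed(1)
pval <- function(n, beta = -0.03) {
  x <- rnorm(n, 4, .8)
  y <- 5 + beta * x + rnorm(n)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}
pval(100)  # typically far above .05
pval(1e5)  # tiny: well powered to detect a small effect
```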